Prefer VersionStore::store_version_from_reader #405

Merged
CleanCut merged 17 commits into main from cleancut/rm-store-version-from-path
Apr 3, 2026

Conversation

@CleanCut
Contributor

This PR removes VersionStore::store_version_from_path (and its implementations) in favor of using VersionStore::store_version_from_reader. This makes it more straightforward to maintain the VersionStore trait and its various implementations, and makes it clearer how file bytes are traveling through the various calls.

…producer task errors out--otherwise we may attempt to begin or continue uploads after sending abort_multipart_upload, which would be weird
@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Documentation

    • Added repository guidelines for async IO in Rust code.
    • Added "Large File Support" roadmap with planned implementation phases.
  • Refactor

    • Refactored file storage operations to use async reader-based approach.
    • Standardized file ingestion across multiple storage operations for improved scalability.

Walkthrough

This PR removes the synchronous path-based store_version_from_path API from the VersionStore trait and migrates all call sites to use the async reader-based store_version_from_reader method instead. Files are now read asynchronously, with content passed via readers and explicit size parameters, aligning with the codebase's shift to async IO practices.
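The call-site change described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses synchronous `std::io` so that it stays dependency-free, whereas the real `store_version_from_reader` is async. The trait `VersionStoreSketch`, the `MemoryStore` type, and the `store_file` helper are all hypothetical names, and the real method's exact signature is not shown on this page.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

/// Hypothetical, synchronous stand-in for the reader-based trait method.
/// The real `VersionStore::store_version_from_reader` is async, and its
/// exact signature is an assumption here.
trait VersionStoreSketch {
    fn store_version_from_reader(
        &mut self,
        hash: &str,
        reader: Box<dyn Read>,
        size: u64,
    ) -> io::Result<()>;
}

/// Toy in-memory store, used only to exercise the call shape.
#[derive(Default)]
struct MemoryStore {
    versions: HashMap<String, Vec<u8>>,
}

impl VersionStoreSketch for MemoryStore {
    fn store_version_from_reader(
        &mut self,
        hash: &str,
        mut reader: Box<dyn Read>,
        size: u64,
    ) -> io::Result<()> {
        let mut buf = Vec::new();
        reader.read_to_end(&mut buf)?;
        // Reject readers whose length disagrees with the declared size.
        if buf.len() as u64 != size {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "size mismatch"));
        }
        self.versions.insert(hash.to_string(), buf);
        Ok(())
    }
}

/// What a former `store_version_from_path(hash, path)` call site does now:
/// open the file, look up its size, and hand over a boxed reader.
fn store_file(store: &mut dyn VersionStoreSketch, hash: &str, path: &Path) -> io::Result<()> {
    let file = File::open(path)?;
    let size = file.metadata()?.len();
    store.store_version_from_reader(hash, Box::new(BufReader::new(file)), size)
}
```

The point of the shape is that the store no longer needs to know about paths at all: the caller owns file opening and size lookup, and every implementation receives bytes the same way.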

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates: .claude/CLAUDE.md, TODO.md | Added async IO guideline for Rust code and new "Large File Support" section describing a three-phase plan for handling large files with streaming and chunking. |
| Call Site Migrations: crates/lib/src/api/client/entries.rs, crates/lib/src/core/v_latest/add.rs, crates/lib/src/core/v_latest/branches.rs, crates/lib/src/core/v_latest/workspaces/files.rs, crates/lib/src/repositories/remote_mode/checkout.rs | Updated version storage calls from store_version_from_path(hash, path) to async reader-based store_version_from_reader(hash, Box::new(reader), size), including async file opening and metadata retrieval at each call site. |
| API Removal: crates/lib/src/storage/version_store.rs, crates/lib/src/storage/local.rs, crates/lib/src/storage/s3.rs | Removed store_version_from_path method declaration from the VersionStore trait and its implementations in LocalVersionStore and S3VersionStore. Removed sync file-loading logic from S3 implementation. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Implement S3VersionStore::store_version_from_reader #398: Implements and modifies S3VersionStore::store_version_from_reader signature and streaming behavior, directly related to the reader-based API now being adopted across call sites.
  • Tests for S3VersionStore::store_version_from_reader #404: Adds test helpers and implements store_version_from_reader for S3VersionStore, directly paired with this PR's migration of callers to that method.
  • Oxen-AI/Oxen#647: Modifies process_add_dir in the same file (core/v_latest/add.rs) to refactor add logic, creating potential conflicts or dependencies with these storage call changes.

Suggested reviewers

  • malcolmgreaves
  • jcelliott
  • rpschoenburg

Poem

🐰 A hop from paths to readers swift,
Async files on Tokio's lift,
No blocking here, just streams that flow,
The storage layer steals the show! 📚✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and precisely describes the main change: removing store_version_from_path in favor of store_version_from_reader. |
| Description check | ✅ Passed | The description accurately explains the change and its rationale, detailing both the removal of the old method and the benefits of using the reader-based approach. |

…3 without multipart upload shenanigans" because it fails at runtime without a size

This reverts commit 7690d04.
…t; use file size to determine part size for multipart uploads for files > 100MB
@CleanCut CleanCut force-pushed the cleancut/s3-tests branch from e689414 to 16d3337 Compare April 1, 2026 03:54
@CleanCut CleanCut force-pushed the cleancut/rm-store-version-from-path branch 2 times, most recently from 510937a to 816bdf6 Compare April 1, 2026 15:36
@CleanCut CleanCut force-pushed the cleancut/s3-tests branch from 1ac0ab1 to 58091e1 Compare April 1, 2026 15:46
@CleanCut CleanCut force-pushed the cleancut/rm-store-version-from-path branch 2 times, most recently from 792882a to a7c773f Compare April 1, 2026 15:48
@CleanCut CleanCut force-pushed the cleancut/s3-tests branch from 6cc41e0 to f2c6737 Compare April 1, 2026 17:13
…in favor of using VersionStore::store_version_from_reader
@CleanCut CleanCut force-pushed the cleancut/rm-store-version-from-path branch from a7c773f to 20dd721 Compare April 1, 2026 17:15
Base automatically changed from cleancut/s3-tests to main April 2, 2026 02:42
Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
crates/lib/src/api/client/entries.rs (1)

242-249: Consider streaming from disk instead of re-buffering in memory.

The current implementation reads the entire file into memory with tokio::fs::read, computes the hash, then passes the same bytes via Cursor to store_version_from_reader. For large files, this keeps the full content in memory.

Since the hash computation requires the full content, the memory usage isn't a regression from before. However, for consistency with other call sites and better memory efficiency on large files, consider refactoring to:

  1. Compute hash during download (streaming hash)
  2. Then stream from disk to version store using tokio::fs::File + BufReader

This would avoid holding the full file in memory after hashing.
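Step 1 of that suggestion, hashing while streaming, might look roughly like the sketch below. It is only an illustration of the technique: FNV-1a is used as a dependency-free stand-in for the project's real hasher (`util::hasher::hash_buffer` is not reproduced here), the code is synchronous `std::io` rather than Tokio, and the names `Fnv1a` and `hash_streaming` are hypothetical.

```rust
use std::io::{self, Read};

/// FNV-1a (64-bit), a stand-in for the project's real hasher.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }

    /// Fold more bytes into the running hash; may be called repeatedly.
    fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hash a reader in fixed-size chunks so that at most `chunk_size` bytes are
/// held in memory at once. Returns (hash, total bytes read).
fn hash_streaming<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<(u64, u64)> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut hasher = Fnv1a::new();
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        hasher.update(&buf[..n]);
        total += n as u64;
    }
    Ok((hasher.finish(), total))
}
```

Because the function also returns the byte count, a caller would have both the hash and the size in hand before reopening the file and passing a buffered reader to the version store, which is step 2 of the suggestion.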

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/api/client/entries.rs` around lines 242 - 249, The code
currently reads the whole file into memory via tokio::fs::read, computes the
hash with util::hasher::hash_buffer and then constructs a Cursor to call
version_store.store_version_from_reader, which keeps the full buffer in memory;
change this to compute the hash while streaming (e.g., open the file with
tokio::fs::File and feed bytes through a streaming hasher during download or
read), then reopen or seek the file and pass a tokio::fs::File wrapped in
tokio::io::BufReader (or an async Read) into
version_store.store_version_from_reader along with the computed hash and size so
the full content is not re-buffered in memory.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6bd41b51-0e02-40c9-8f84-314469b879ad

📥 Commits

Reviewing files that changed from the base of the PR and between d378f82 and 4328f2e.

📒 Files selected for processing (10)
  • .claude/CLAUDE.md
  • TODO.md
  • crates/lib/src/api/client/entries.rs
  • crates/lib/src/core/v_latest/add.rs
  • crates/lib/src/core/v_latest/branches.rs
  • crates/lib/src/core/v_latest/workspaces/files.rs
  • crates/lib/src/repositories/remote_mode/checkout.rs
  • crates/lib/src/storage/local.rs
  • crates/lib/src/storage/s3.rs
  • crates/lib/src/storage/version_store.rs
💤 Files with no reviewable changes (3)
  • crates/lib/src/storage/local.rs
  • crates/lib/src/storage/s3.rs
  • crates/lib/src/storage/version_store.rs

@CleanCut CleanCut merged commit bfa09f0 into main Apr 3, 2026
5 checks passed
@CleanCut CleanCut deleted the cleancut/rm-store-version-from-path branch April 3, 2026 20:03
2 participants